Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts
Text detoxification has the potential to mitigate the harms of toxicity by
rephrasing text to remove offensive meaning, but subtle toxicity remains
challenging to tackle. We introduce MaRCo, a detoxification algorithm that
combines controllable generation and text rewriting methods using a Product of
Experts with autoencoder language models (LMs). MaRCo uses likelihoods under a
non-toxic LM (expert) and a toxic LM (anti-expert) to find candidate words to
mask and potentially replace. We evaluate our method on several subtle toxicity
and microaggressions datasets, and show that it not only outperforms baselines
on automatic metrics, but MaRCo's rewrites are also preferred 2.1 times more in
human evaluation. Its applicability to instances of subtle toxicity is
especially promising, demonstrating a path forward for addressing increasingly
elusive online hate.
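As an illustration of the expert/anti-expert idea described in this abstract, here is a minimal sketch of likelihood-based masking, assuming hypothetical `expert_logp` and `antiexpert_logp` token scorers; the paper's actual method uses autoencoder LMs in a Product of Experts to both mask and infill:

```python
from typing import Callable, List

# Hypothetical scorers: log-likelihood of tokens[i] under each LM.
TokenScorer = Callable[[List[str], int], float]

def marco_style_mask(tokens: List[str],
                     expert_logp: TokenScorer,
                     antiexpert_logp: TokenScorer,
                     threshold: float = 0.0) -> List[str]:
    """Mask words the toxic anti-expert likes more than the non-toxic expert."""
    out = []
    for i, tok in enumerate(tokens):
        # A positive log-ratio means the anti-expert assigns this word
        # higher probability, flagging it as potentially toxic.
        log_ratio = antiexpert_logp(tokens, i) - expert_logp(tokens, i)
        out.append("[MASK]" if log_ratio > threshold else tok)
    return out

# Toy demo with keyword-based stand-ins for the two LMs.
TOXIC_WORDS = {"stupid"}
toy_expert = lambda toks, i: -5.0 if toks[i] in TOXIC_WORDS else -1.0
toy_antiexpert = lambda toks, i: -1.0

print(marco_style_mask("that idea is stupid".split(), toy_expert, toy_antiexpert))
# ['that', 'idea', 'is', '[MASK]'] -> masked spans would then be infilled
```

Masked positions would then be rewritten by a generator steered by the same expert/anti-expert product, which is where the controllable-generation half of the method comes in.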
Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting
Most existing stylistic text rewriting methods and evaluation metrics operate
on a sentence level, but ignoring the broader context of the text can lead to
preferring generic, ambiguous, and incoherent rewrites. In this paper, we
investigate integrating the preceding textual context into both the
rewriting and evaluation stages of stylistic text
rewriting, and introduce a new composite contextual evaluation metric,
CtxSimFit, which combines similarity to the original sentence with
contextual cohesiveness. We comparatively evaluate non-contextual and
contextual rewrites in formality, toxicity, and sentiment transfer tasks. Our
experiments show that humans significantly prefer contextual rewrites as more
fitting and natural over non-contextual ones, yet existing sentence-level
automatic metrics (e.g., ROUGE, SBERT) correlate poorly with human preferences
(ρ=0–0.3). In contrast, human preferences are much better reflected by
both our novel CtxSimFit (ρ=0.7–0.9) as well as by the proposed
context-infused versions of common metrics (ρ=0.4–0.7). Overall, our
findings highlight the importance of integrating context into the generation
and especially the evaluation stages of stylistic text rewriting.
Comment: EMNLP 2023 main camera ready
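As a sketch of what a composite contextual metric can look like, the snippet below mixes rewrite-to-original similarity with rewrite-to-context cohesiveness; the `embed` function, cosine scoring, and mixing weight `alpha` are assumptions for illustration, not the paper's exact CtxSimFit formulation:

```python
from typing import Callable
import numpy as np

Embedder = Callable[[str], np.ndarray]  # any sentence-embedding model

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contextual_score(original: str, rewrite: str, context: str,
                     embed: Embedder, alpha: float = 0.5) -> float:
    """Composite score: similarity to the source sentence blended with
    cohesiveness against the preceding context (hypothetical weighting)."""
    similarity = cosine(embed(original), embed(rewrite))
    cohesiveness = cosine(embed(context), embed(rewrite))
    return alpha * similarity + (1.0 - alpha) * cohesiveness

# Toy character-count embedder just to make the sketch executable.
def toy_embed(text: str) -> np.ndarray:
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    return v

print(contextual_score("He is rude.", "He is impolite.",
                       "We met the new manager today.", toy_embed))
```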
Towards Countering Essentialism through Social Bias Reasoning
Essentialist beliefs (i.e., believing that members of the same group are
fundamentally alike) play a central role in social stereotypes and can lead to
harm when left unchallenged. In our work, we conduct exploratory studies into
the task of countering essentialist beliefs (e.g., ``liberals are stupid'').
Drawing on prior work from psychology and NLP, we construct five types of
counterstatements and conduct human studies on the effectiveness of these
different strategies. Our studies also investigate how the level of
explicitness with which an essentialist belief is conveyed affects the choice
of counterstatement. We find that statements that broaden the scope of a stereotype
(e.g., to other groups, as in ``conservatives can also be stupid'') are the
most popular countering strategy. We conclude with a discussion of challenges
and open questions for future work in this area (e.g., improving factuality,
studying community-specific variation) and we emphasize the importance of work
at the intersection of NLP and psychology.
Comment: Workshop on NLP for Positive Impact @ EMNLP 202
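The scope-broadening strategy in the abstract's own example reduces to a simple template; the function below is purely illustrative and is not one of the paper's five constructed counterstatement types:

```python
def broaden_scope(group: str, attribute: str, other_group: str) -> str:
    """Counter an essentialist claim "<group> are <attribute>" by
    extending the attribute beyond the stereotyped group."""
    return f"{other_group} can also be {attribute}"

print(broaden_scope("liberals", "stupid", "conservatives"))
# -> "conservatives can also be stupid"
```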
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Dogwhistles are coded expressions that simultaneously convey one meaning to a
broad audience and a second one, often hateful or provocative, to a narrow
in-group; they are deployed to evade both political repercussions and
algorithmic content moderation. For example, in the sentence 'we need to end
the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to
many, but secretly means 'Jewish' to a select few. We present the first
large-scale computational investigation of dogwhistles. We develop a typology
of dogwhistles, curate the largest-to-date glossary of over 300 dogwhistles
with rich contextual information and examples, and analyze their usage in
historical U.S. politicians' speeches. We then assess whether a large language
model (GPT-3) can identify dogwhistles and their meanings, and find that
GPT-3's performance varies widely across types of dogwhistles and targeted
groups. Finally, we show that harmful content containing dogwhistles avoids
toxicity detection, highlighting online risks of such coded language. This work
sheds light on the theoretical and applied importance of dogwhistles in both
NLP and computational social science, and provides resources for future
research in modeling dogwhistles and mitigating their online harms.
Comment: ACL 2023, see https://dogwhistles.allen.ai/ for the glossary and
other material
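As a sketch of how a glossary like the one released with this work could drive a simple detector, the snippet below does surface-form matching; the entry shown and its fields are illustrative, not the released resource's actual schema:

```python
import re
from typing import Dict, List, Tuple

# Illustrative entry; the glossary at https://dogwhistles.allen.ai/ pairs
# each term with rich contextual information and examples.
GLOSSARY: Dict[str, str] = {
    "cosmopolitan": "covert antisemitic reference",
}

def find_dogwhistles(text: str, glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Flag glossary terms appearing in text (surface match only; real
    detection must disambiguate benign uses from coded ones)."""
    hits = []
    for term, meaning in glossary.items():
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            hits.append((term, meaning))
    return hits

print(find_dogwhistles("We need to end the cosmopolitan experiment.", GLOSSARY))
```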
ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
We present ATOMIC, an atlas of everyday commonsense reasoning, organized
through 877k textual descriptions of inferential knowledge. Compared to
existing resources that center around taxonomic knowledge, ATOMIC focuses on
inferential knowledge organized as typed if-then relations with variables
(e.g., "if X pays Y a compliment, then Y will likely return the compliment").
We propose nine if-then relation types to distinguish causes vs. effects,
agents vs. themes, voluntary vs. involuntary events, and actions vs. mental
states. By generatively training on the rich inferential knowledge described in
ATOMIC, we show that neural models can acquire simple commonsense capabilities
and reason about previously unseen events. Experimental results demonstrate
that multitask models that incorporate the hierarchical structure of if-then
relation types lead to more accurate inference compared to models trained in
isolation, as measured by both automatic and human evaluation.
Comment: AAAI 2019
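The typed if-then structure is easy to picture as a small record; the nine relation types below are the ones proposed in the paper, while the event and inference strings are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The nine if-then relation types proposed in ATOMIC
# (X = the event's agent, o = other participants).
RELATIONS = ["xIntent", "xNeed", "xAttr", "xEffect", "xReact",
             "xWant", "oEffect", "oReact", "oWant"]

@dataclass
class AtomicEvent:
    """An event paired with typed if-then commonsense inferences."""
    event: str
    inferences: Dict[str, List[str]] = field(default_factory=dict)

example = AtomicEvent(
    event="PersonX pays PersonY a compliment",
    inferences={
        "xIntent": ["to be nice"],               # why X did it
        "oReact": ["flattered"],                 # how others feel
        "oWant": ["to return the compliment"],   # what others want next
    },
)
print(example.inferences["oWant"])
```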
NLPositionality: Characterizing Design Biases of Datasets and Models
Design biases in NLP systems, such as performance differences for different
populations, often stem from their creator's positionality, i.e., views and
lived experiences shaped by identity and background. Despite the prevalence and
risks of design biases, they are hard to quantify because researcher, system,
and dataset positionality is often unobserved. We introduce NLPositionality, a
framework for characterizing design biases and quantifying the positionality of
NLP datasets and models. Our framework continuously collects annotations from a
diverse pool of volunteer participants on LabintheWild, and statistically
quantifies alignment with dataset labels and model predictions. We apply
NLPositionality to existing datasets and models for two tasks -- social
acceptability and hate speech detection. To date, we have collected 16,299
annotations in over a year for 600 instances from 1,096 annotators across 87
countries. We find that datasets and models align predominantly with Western,
White, college-educated, and younger populations. Additionally, certain groups,
such as non-binary people and non-native English speakers, are further
marginalized by datasets and models as they rank least in alignment across all
tasks. Finally, we draw from prior literature to discuss how researchers can
examine their own positionality and that of their datasets and models, opening
the door for more inclusive NLP systems.
Comment: ACL 2023
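A minimal sketch of the alignment idea: group annotations by demographic attribute and measure how well each group's labels agree with a dataset's labels. The simple percent-agreement statistic and the demo rows here are stand-ins; the framework's actual analysis is more involved:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each annotation: (demographic_group, instance_id, label).
Annotation = Tuple[str, str, int]

def group_alignment(annotations: List[Annotation],
                    dataset_labels: Dict[str, int]) -> Dict[str, float]:
    """Percent agreement between each group's annotations and dataset labels."""
    agree, total = defaultdict(int), defaultdict(int)
    for group, instance, label in annotations:
        total[group] += 1
        agree[group] += int(label == dataset_labels[instance])
    return {g: agree[g] / total[g] for g in total}

demo = [("18-24", "ex1", 1), ("18-24", "ex2", 0), ("55+", "ex1", 0)]
print(group_alignment(demo, {"ex1": 1, "ex2": 0}))
# {'18-24': 1.0, '55+': 0.0}; lower scores flag groups the data misaligns with
```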